Continuous Integration Practices in Machine Learning Projects: The Practitioners' Perspective
Bernardo, João Helis, da Costa, Daniel Alencar, Cogo, Filipe Roseiro, de Medeiros, Sérgio Queiróz, Kulesza, Uirá
Continuous Integration (CI) is a cornerstone of modern software development. However, while widely adopted in traditional software projects, applying CI practices to Machine Learning (ML) projects presents distinctive characteristics. For example, our previous work revealed that ML projects often experience longer build durations and lower test coverage rates compared to their non-ML counterparts. Building on these quantitative findings, this study surveys 155 practitioners from 47 ML projects to investigate the underlying reasons for these distinctive characteristics through a qualitative perspective. Practitioners highlighted eight key differences, including test complexity, infrastructure requirements, and build duration and stability. Common challenges mentioned by practitioners include higher project complexity, model training demands, extensive data handling, increased computational resource needs, and dependency management, all contributing to extended build durations. Furthermore, ML systems' non-deterministic nature, data dependencies, and computational constraints were identified as significant barriers to effective testing. The key takeaway from this study is that while foundational CI principles remain valuable, ML projects require tailored approaches to address their unique challenges. To bridge this gap, we propose a set of ML-specific CI practices, including tracking model performance metrics and prioritizing test execution within CI pipelines. Additionally, our findings highlight the importance of fostering interdisciplinary collaboration to strengthen the testing culture in ML projects. By bridging quantitative findings with practitioners' insights, this study provides a deeper understanding of the interplay between CI practices and the unique demands of ML projects, laying the groundwork for more efficient and robust CI strategies in this domain.
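One of the proposed ML-specific practices, tracking model performance metrics within CI pipelines, can be sketched as a simple metric gate that fails the build on regression. The metric, data, and threshold below are illustrative assumptions, not taken from the study:

```python
# Hedged sketch of one proposed CI practice: failing the build when a tracked
# model metric drops below a baseline. Metric, data, and threshold are
# illustrative assumptions, not taken from the study.
def accuracy(preds, labels):
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def test_model_meets_baseline():
    # In a real CI job these would come from evaluating the freshly trained
    # model on a fixed holdout set fetched by the pipeline.
    preds  = [1, 0, 1, 1, 0, 1, 0, 0]
    labels = [1, 0, 1, 0, 0, 1, 0, 0]
    baseline = 0.80  # build fails if accuracy regresses below this
    assert accuracy(preds, labels) >= baseline

test_model_meets_baseline()
print("model performance gate passed")
```

A test like this runs alongside conventional unit tests, so an accuracy regression blocks a merge the same way a failing test would.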
- South America > Brazil > Rio Grande do Norte > Natal (0.04)
- Oceania > New Zealand > South Island > Otago > Dunedin (0.04)
- North America > Canada > Ontario > Kingston (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Software Engineering (1.00)
- Information Technology > Software (1.00)
- Information Technology > Data Science (1.00)
- (2 more...)
MLScent: A Tool for Anti-pattern Detection in ML Projects
Shivashankar, Karthik, Martini, Antonio
Machine learning (ML) codebases face unprecedented challenges in maintaining code quality and sustainability as their complexity grows. While traditional code smell detection tools exist, they fail to address ML-specific issues that can significantly impact model performance, reproducibility, and maintainability. This paper introduces MLScent, a novel static analysis tool that leverages sophisticated Abstract Syntax Tree (AST) analysis to detect anti-patterns and code smells specific to ML projects. MLScent implements 76 distinct detectors across major ML frameworks including TensorFlow (13 detectors), PyTorch (12 detectors), Scikit-learn (9 detectors), and Hugging Face (10 detectors), along with data science libraries like Pandas and NumPy (8 detectors each). Our evaluation demonstrates MLScent's effectiveness through both quantitative classification metrics and qualitative assessment via user study feedback with ML practitioners. Results show high accuracy in identifying framework-specific anti-patterns, data handling issues, and general ML code smells across real-world projects.
The software development landscape has undergone a dramatic transformation with the integration of Machine Learning (ML). Recent statistics from Gartner highlight this shift, revealing a striking 270% increase in ML adoption within enterprise software projects over the last four years [1]. This rapid adoption, however, brings its own set of complexities. Traditional software development practices have had to evolve significantly to accommodate ML's unique requirements, including the need for extensive datasets, sophisticated algorithms, and iterative development cycles [3]. These fundamental differences have catalyzed a complete reimagining of software development methodologies, from initial design through testing and maintenance [4], [5], as also highlighted by Tang et al. [6] in their empirical study of ML systems refactoring and technical debt.
ML projects introduce distinct code quality challenges that set them apart from conventional software development. The complexity stems from their inherent characteristics: intricate mathematical operations, extensive data preprocessing requirements, and sophisticated model architectures that challenge traditional code maintenance approaches [7].
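As a hedged sketch of what AST-based smell detection looks like, the snippet below flags one common ML reproducibility smell: calls to scikit-learn's `train_test_split` that omit a `random_state` seed. The rule and detector are illustrative, not MLScent's actual implementation:

```python
import ast

def find_unseeded_splits(source: str) -> list[int]:
    """Return line numbers of train_test_split calls missing random_state,
    a common reproducibility smell in ML code (illustrative rule only)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # handle both plain names and attribute calls (e.g. ms.train_test_split)
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
            if name == "train_test_split":
                kwargs = {kw.arg for kw in node.keywords}
                if "random_state" not in kwargs:
                    hits.append(node.lineno)
    return hits

sample = """
from sklearn.model_selection import train_test_split
a = train_test_split(X, y)                   # smell: unseeded split
b = train_test_split(X, y, random_state=42)  # ok
"""
print(find_unseeded_splits(sample))
```

Real detectors layer many such AST rules per framework; this shows only the mechanism of walking the tree and matching call shapes.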
- North America > United States (0.04)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.88)
"They've Stolen My GPL-Licensed Model!": Toward Standardized and Transparent Model Licensing
Duan, Moming, Zhao, Rui, Jiang, Linshan, Shadbolt, Nigel, He, Bingsheng
As model parameter sizes reach the billion-level range and their training consumes zettaFLOPs of computation, component reuse and collaborative development are becoming increasingly prevalent in the Machine Learning (ML) community. These components, including models, software, and datasets, may originate from various sources and be published under different licenses, which govern the use and distribution of licensed works and their derivatives. However, commonly chosen licenses, such as GPL and Apache, are software-specific and are not clearly defined or bounded in the context of model publishing. Meanwhile, the reused components may also carry free-content licenses and model licenses, which pose a potential risk of license noncompliance and rights infringement within the model production workflow. In this paper, we propose addressing the above challenges along two lines: 1) For license analysis, we have developed a new vocabulary for ML workflow management and encoded license rules to enable ontological reasoning for analyzing rights granting and compliance issues. 2) For standardized model publishing, we have drafted a set of model licenses that provide flexible options to meet the diverse needs of model publishing. Our analysis tool is built on the Turtle language and the Notation3 reasoning engine, envisioned as a first step toward Linked Open Model Production Data. We have also encoded our proposed model licenses into rules and demonstrated the effects of GPL and other commonly used licenses in model publishing, along with the flexibility advantages of our licenses, through comparisons and experiments.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Finland (0.04)
- North America > United States > New Jersey (0.04)
- (5 more...)
- Information Technology > Communications > Web (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
How do Machine Learning Projects use Continuous Integration Practices? An Empirical Study on GitHub Actions
Bernardo, João Helis, da Costa, Daniel Alencar, de Medeiros, Sérgio Queiroz, Kulesza, Uirá
Continuous Integration (CI) is a well-established practice in traditional software development, but its nuances in the domain of Machine Learning (ML) projects remain relatively unexplored. Given the distinctive nature of ML development, understanding how CI practices are adopted in this context is crucial for tailoring effective approaches. In this study, we conduct a comprehensive analysis of 185 open-source projects on GitHub (93 ML and 92 non-ML projects). Our investigation comprises both quantitative and qualitative dimensions, aiming to uncover differences in CI adoption between ML and non-ML projects. Our findings indicate that ML projects often require longer build durations, and medium-sized ML projects exhibit lower test coverage compared to non-ML projects. Moreover, small and medium-sized ML projects show a higher prevalence of increasing build duration trends compared to their non-ML counterparts. Additionally, our qualitative analysis illuminates the discussions around CI in both ML and non-ML projects, encompassing themes like CI Build Execution and Status, CI Testing, and CI Infrastructure. These insights shed light on the unique challenges faced by ML projects in adopting CI practices effectively.
- Europe > Portugal > Lisbon > Lisbon (0.05)
- South America > Brazil > Rio Grande do Norte > Natal (0.04)
- North America > United States > Virginia (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Successful Machine Learning Development Requires a New Paradigm (Thought Leaders)
Initiatives using machine learning cannot be treated in the same manner as projects involving conventional software. It's imperative to move quickly so that you can test things, fix issues and test them again. In other words, you must be able to fail quickly – and do so early on in the process. Waiting until later in this process to find issues can end up being very expensive and time-consuming. When developing software using the traditional method, you use decision logic.
Enterprise ML Platforms Done Right
Many companies are attempting to speed up the delivery of their machine learning (ML) projects by creating platforms. While a few have succeeded, some have experienced significant failures, and most have ended up somewhere in the middle. This can happen when they address MLOps without first addressing their organizational structure and operating model. In this article, we will explore common pitfalls enterprises encounter when building ML platforms and provide solutions to help overcome these obstacles. We will tackle five common pitfalls enterprises face when getting their platform up and running and propose prescriptive solutions for each. To simplify the language, we will use the term "you" to refer to the team responsible for building and maintaining the platform.
New at Civo Navigate: Making Machine Learning Set up Faster - The New Stack
Of the time it takes to set up a machine learning project, 60% is actually spent performing infrastructure engineering tasks. That compares to 20% doing data engineering, Civo Chief Innovation Officer Josh Mesout, who has launched 300 machine learning (ML) models in the past two and a half years, said at the Civo Navigate conference here on Tuesday. Civo hopes to simplify machine learning infrastructure with a new managed service offering, Kubeflow as a Service, which it says will improve the developer experience and reduce the time and resources required to gain insights from machine learning algorithms. The Kubernetes cloud provider is betting that developers don't want to deal with the infrastructure piece of the ML puzzle. So its new offering will run the infrastructure for ML as a managed service, while supporting open source tools and frameworks. It believes this will make ML more accessible to smaller organizations, which it said are often priced out of ML due to economies of scale.
- North America > United States > New York (0.05)
- North America > United States > Florida > Hillsborough County > Tampa (0.05)
- Europe > Germany > Hesse > Darmstadt Region > Frankfurt (0.05)
What Most People Don't Understand About AI - and The Ultimat
In other words, to say that artificial intelligence (AI) is the next step in enterprise would be an understatement. But while it is well known that AI is the next step forward, myths and misconceptions about AI and its processes still run rampant. In order for AI and ML to be used to their maximum potential to help streamline enterprise operations, reduce costs, reduce risk and increase profits, they need to be implemented with precision by those with realistic expectations. In 2019, Techopedia ran a two-part survey and quiz to help us examine how well industry executives comprehend AI and machine learning (ML). The results of our survey supported one clear conclusion: Business and industry executives do not understand the majority of AI and ML.
KID, DataRobot partnership makes data science accessible to every business
Amid soaring demand for tools to enable the data-driven organisation, a partnership between data specialists Knowledge Integration Dynamics (KID) and global AI cloud leader DataRobot is automating and democratising artificial intelligence (AI) and machine learning (ML), putting it into the hands of more South African businesses. Markus Top, who is heading up the partnership at KID, says it is a logical next step for KID, which has supported South African enterprises through their data journey for over 20 years. "Every business today wants to be data driven and embed AI at scale. However, until fairly recently achieving this has been a costly and time-consuming task," Top says. "With DataRobot, the manual, time-consuming processes within AI and ML projects are largely automated, allowing businesses to transform and innovate faster."
- Africa > South Africa > Western Cape > Cape Town (0.05)
- Africa > South Africa > Gauteng > Pretoria (0.05)
- Africa > South Africa > Gauteng > Johannesburg (0.05)
- Information Technology > Communications > Social Media (0.50)
- Information Technology > Artificial Intelligence > Machine Learning (0.35)
- Information Technology > Data Science > Data Mining (0.32)
The Most Fundamental Layer of MLOps -- Required Infrastructure
In my previous post, I discussed the three key components of an end-to-end MLOps solution: data and feature engineering pipelines, ML model training and retraining pipelines, and ML model serving pipelines. You can find the article here: Learn the Core of MLOps -- Building ML Pipelines. At the end of my last post, I briefly mentioned that the complexity of MLOps solutions can vary significantly from one to another, depending on the nature of the ML project and, more importantly, on the underlying infrastructure required. Therefore, in today's post, I will explain how the different levels of infrastructure required determine the complexity of MLOps solutions, and categorize MLOps solutions into different levels. More importantly, in my view, categorizing MLOps into different levels makes it easier for organizations of any size to adopt MLOps.
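The three components the post names can be sketched end to end in a few lines. Everything below is a dependency-free toy: the function names, the min-max scaling, and the threshold "model" are illustrative assumptions, not a real MLOps API:

```python
# Toy sketch of the three MLOps components: feature engineering pipeline,
# training pipeline, and serving step. All names and the "model" itself
# are illustrative; production stacks replace each with managed tooling.
def feature_pipeline(raw):
    # feature engineering: min-max scale values into [0, 1]
    lo, hi = min(raw), max(raw)
    return [(x - lo) / (hi - lo) for x in raw]

def train(features, labels):
    # "model": threshold at the smallest positive-class feature value
    positives = [f for f, l in zip(features, labels) if l == 1]
    return {"threshold": min(positives)}

def serve(model, feature):
    # serving: score a single incoming feature
    return 1 if feature >= model["threshold"] else 0

raw, labels = [10, 20, 30, 40], [0, 0, 1, 1]
feats = feature_pipeline(raw)
model = train(feats, labels)
preds = [serve(model, f) for f in feats]
print(preds)  # → [0, 0, 1, 1]
```

The point of separating the three callables is that each can be owned, versioned, and scaled independently, which is exactly where infrastructure requirements start to diverge between MLOps levels.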